(54) [Title of the Invention] Device and Method for Voice Recognition 
(57) [Abstract] 

[Problem to be Solved] To lighten the burden of speaking on a user when a destination for 
navigation is vocally set. 

[Solution] The user inputs destination data for navigation through a microphone 10. A 
control unit 14 stores the input spoken data in a spoken data storage unit 22 and analyzes at 
least part of the input spoken data by using a voice database 18. The voice database 18 
used for the analysis is changed based on the analysis result, and the spoken data stored in 
the spoken data storage unit 22 is read out and analyzed again. Because the stored 
spoken data is reused, the user needs to speak only once to set the destination with high 
precision. 
[Claims for Patent] 

[Claim 1] A voice recognition device characterized by comprising: 

spoken data storage means for storing spoken data of a user; 

first voice analysis means for analyzing at least a portion of the spoken data by 
comparing the spoken data and voice data in a voice database; 

changing means for changing the voice database based upon analysis data obtained 
by the first voice analysis means; and 

second voice analysis means for re-analyzing the spoken data by reading out the 
spoken data stored in the spoken data storage means, and comparing the spoken data with 
voice data in the voice database changed by the changing means. 

[Claim 2] The voice recognition device according to claim 1, characterized in that the 
spoken data is destination data for navigation, and further comprising: 

means for processing predetermined data as a landmark for a destination when the 
predetermined data is obtained by the second voice analysis means. 
[Claim 3] A voice recognition method characterized by comprising the steps of: 
storing spoken data of a user; 

a first analysis for analyzing at least a portion of voice data by comparing the 
spoken data of the user and the voice data in a voice database; 

changing the voice database based upon analysis data obtained by the first voice 
analysis; and 

a second analysis for re-analyzing by reading out the spoken data stored in the 
storing step, and comparing the spoken data read out with voice data in the voice database 
changed in the changing step. 

[Claim 4] The voice recognition method according to claim 3, characterized in that the 
spoken data is destination data for navigation, and further comprising the step of: 

processing predetermined data as a landmark for a destination when the 
predetermined data is obtained by the second analysis. 
[Detailed Description of the Invention] 
[0001] 
[Technical Field of the Invention] The present invention relates to a device and method 
for voice recognition, and in particular, relates to voice recognition for setting a destination 
in a navigation system. 
[0002] 

[Related Art] Art has been proposed in the past for executing various processing by voice, 
such as setting a destination, in a navigation system. In such art, an important issue is how 
quickly and accurately a voice spoken by a user can be recognized. Voice recognition is usually 
performed by comparing user spoken data and voice data in a voice database prepared in 
advance. A hierarchical voice database is often used. 

[0003] For example, an art is disclosed in Japanese Patent Laid-Open Publication No. 
10-62199, in which a voice database is separated into three layers. Layer 1 stores facility 
names with location information, and facility category names without location information. 
Layer 2 stores facility names with location information and prefecture names without 
location information, which correspond to the category names stored in layer 1. Layer 3 
stores facility names with location information corresponding to the prefectures in layer 2. 
Voice recognition is performed by switching layers sequentially in accordance with user 
spoken data. 
[0004] 

[Problem to be Solved by the Invention] In the above related art, the layers of the voice 
database are changed each time the user speaks. Therefore, if the user wants to set the 
navigation destination as XX department store in a department store category, he or she 
must repeatedly say [facility] -> [department store] -> [XX department store] in that order. 
The destination could not be set by saying a natural phrase once, such as "I want to go to a 
department store, the XX department store". 

[0005] Furthermore, some users may want to set the destination based on a certain 
landmark object, such as a parking lot near the XX store, for example. However, it was 
impossible for the related art to recognize a destination setting based on a landmark object 
in this manner. 

[0006] In light of the problems found in the above related art, it is an object of the present 
invention to provide a device and method capable of more easily setting desired data by 
voice and lightening the burden of speaking on a user. 
[0007] 
[Means for Solving the Problem] In order to achieve the above object, a first invention is 
characterized by including spoken data storage means for storing spoken data of a user; first 
voice analysis means for analyzing at least a portion of the spoken data by comparing the 
spoken data and voice data in a voice database; changing means for changing the voice 
database based upon analysis data obtained by the first voice analysis means; and second 
voice analysis means for re-analyzing the spoken data by reading out the spoken data stored 
in the spoken data storage means, and comparing the spoken data with voice data in the 
voice database changed by the changing means. Voice recognition can be performed reliably 
after the user speaks only once: the first voice analysis means analyzes the spoken data, and 
the second voice analysis means then reads out the stored spoken data and analyzes it again 
(with the voice database changed and optimized for re-analysis). It should be noted 
that the first voice analysis means and the second voice analysis means do not have to be 
separate, and both functions may also be achieved by the same means. 
[0008] A second invention according to the first invention is characterized in that the 
spoken data is destination data for navigation, and further includes means for processing 
predetermined data as a landmark for a destination when the predetermined data is obtained 
by the second voice analysis means. By treating predetermined data obtained in a 
voice analysis as a landmark, for example, in spoken data "XX near YY", it is possible to 
obtain the predetermined data "near YY" first and then find the actual destination "XX" using 
"YY" as a landmark. 

[0009] A third invention is characterized by including the steps of: storing spoken data of 
a user; a first analysis for analyzing at least a portion of voice data by comparing the 
spoken data of the user and the voice data in a voice database; changing the voice database 
based upon analysis data obtained by the first voice analysis; and a second analysis for 
re-analyzing by reading out the spoken data stored in the storing step, and comparing the 
spoken data read out with voice data in the voice database changed in the changing step. 
[0010] A fourth invention according to the third invention is characterized in that the 
spoken data is destination data for navigation, and further includes the step of processing 
predetermined data as a landmark for a destination when the predetermined data is obtained 
by the second analysis. 
[0011] 

[Embodiments of the Invention] Hereinafter, embodiments of the present invention will 
be described with reference to the accompanying drawings in an example of destination 
setting in a navigation system. 

[0012] FIG. 1 is a structural block diagram of the present embodiment, 
showing the structure of a navigation system with a voice recognition function. 
[0013] The voice of a user (vehicle passenger) is input into a microphone 10 and is then 
supplied to a control unit 14. A present location detecting unit 12 includes a GPS and vehicle 
speed and direction sensors. The present location detecting unit 12 detects the present 
location of the vehicle and supplies it to the control unit 14. 

[0014] The control unit 14 specifically includes a microcomputer. In addition to 
executing various controls required for navigation, the control unit 14 analyzes user spoken 
data input from the microphone 10 to set a destination. In the present embodiment, the 
control unit 14 functions as first voice analysis means and second voice analysis means. 
Moreover, the control unit 14 functions as changing means for changing data used for 
analysis inside a voice database 18. 

[0015] The voice database 18 has a hierarchical structure, and stores voice data for 
comparison with spoken data when the control unit 14 analyzes user spoken data. The 
control unit 14 accesses the voice database 18 as necessary to analyze spoken data. In 
addition, the voice database 18 may be implemented with a CD-ROM, DVD or the like. 
[0016] A map data storage unit 20 stores map data (map data for display and map data for 
route searches) required for navigation. The control unit 14 reads out map data in the 
vicinity of the detected present location from the map data storage unit 20, which is 
displayed on a display unit 24. Alternatively, using map data for route searches, the control 
unit 14 searches for a route to a destination that was obtained by analyzing spoken data. 
The recommended route is then displayed on the display unit 24. Naturally, the 
recommended route may also be reported by voice from a speaker. In addition, the map data 
storage unit 20 may be implemented with a CD-ROM, DVD or the like. 
[0017] A spoken data storage unit 22 stores spoken data input from the microphone 10. 
By reading out spoken data stored in the spoken data storage unit 22, the control unit 14 can 
perform multiple analyses of spoken data without requiring additional speaking by the user. 
In addition, the spoken data storage unit 22 may be implemented with a semiconductor 
memory. 

[0018] It should be noted that an operating unit 16 is used for various input operations, 
such as scrolling map data displayed on the display unit 24, and manually setting a 
destination without speaking. 

[0019] FIG. 2 shows the hierarchical structure of the voice database 18. The voice 
database has three layers: a national-level recognition grammar dictionary, a 
prefectural-level recognition grammar dictionary, and a municipal-level recognition 
grammar dictionary. Note that a "grammar dictionary" is a collection of voice data for a 
grammar method used in the analysis of spoken data by the control unit 14, and the 
grammar method will be described later. The national-level recognition grammar 
dictionary stores data for main places and names for all of Japan; the prefectural-level 
recognition grammar dictionary is classified by each prefecture, and stores data for places 
and names within prefectures; and the municipal-level recognition grammar dictionary is 
classified by each city, and stores data for places and names within cities. 
[0020] The control unit 14 decides what data from which layer of the voice database 18 is 
read out and used, based upon the detected present location and the analysis result of 
spoken data. More specifically, if the present location of the vehicle is the city of Susono 
in Shizuoka Prefecture, for example, the control unit 14 specifies Shizuoka Prefecture as 
the prefectural-level recognition grammar dictionary, and Susono and neighboring cities as 
the municipal-level recognition grammar dictionary within the voice database 18. If the 
vehicle then moves so that its present location is Chiyoda ward in Tokyo, the control unit 
14 specifies Tokyo as the prefectural-level recognition grammar dictionary, and Chiyoda 
ward and neighboring wards as the municipal-level recognition grammar dictionary. One 
advantage of specifying the voice database depending on the present location is a quick 
analysis and recognition ability when the vicinity of the present location is spoken as the 
destination. Furthermore, the control unit 14 changes the data used in the voice database 
18 depending on the analysis result of spoken data. For example, once it is known that the 
city of Mishima is a landmark according to the analysis result of spoken data, the analysis 
is continued after changing the municipal-level recognition grammar dictionary to 
Mishima. 
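
The dictionary selection described above could be sketched roughly as follows. This is only an illustrative sketch in Python and not part of the embodiment: the tables PREFECTURE_OF and NEIGHBORS, the ActiveDictionaries type, and both function names are hypothetical, and the neighbor entries are placeholders rather than real geographic data. The sketch also assumes that switching to a recognized city replaces only the municipal-level dictionary.

```python
from dataclasses import dataclass

# Hypothetical lookup tables; the neighbor entries are placeholders, not real map data.
PREFECTURE_OF = {"Susono": "Shizuoka", "Mishima": "Shizuoka", "Chiyoda": "Tokyo"}
NEIGHBORS = {"Susono": ["Neighbor A"], "Chiyoda": ["Neighbor B"], "Mishima": []}


@dataclass
class ActiveDictionaries:
    national: str      # national-level dictionary, always active
    prefectural: str   # prefecture whose dictionary is active
    municipal: list    # cities whose dictionaries are active


def select_by_location(city: str) -> ActiveDictionaries:
    """Pick the default dictionaries from the detected present location."""
    return ActiveDictionaries(
        national="Japan",
        prefectural=PREFECTURE_OF[city],
        municipal=[city] + NEIGHBORS.get(city, []),
    )


def switch_to_recognized_city(active: ActiveDictionaries, city: str) -> ActiveDictionaries:
    """Change only the municipal-level dictionary once a city is known from analysis."""
    return ActiveDictionaries(active.national, active.prefectural, [city])


# Vehicle is in Susono; the analysis result then names Mishima.
active = select_by_location("Susono")
active = switch_to_recognized_city(active, "Mishima")
```

With these placeholder tables, the vehicle in Susono starts with the Shizuoka and Susono dictionaries, and the municipal-level dictionary becomes Mishima after the switch.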

[0021] FIG. 3 shows a processing flowchart of voice recognition according to the present 
embodiment. First, the user speaks to input a destination (S101). The user may say 
something such as, "XX store in Mishima City" or "I want to go to the parking lot near 
Mishima Station". This spoken data input from the microphone 10 is stored in the spoken 
data storage unit 22 (S102), and the control unit 14 analyzes the input spoken data using a 
grammar method (S103). 

[0022] The grammar method is described here. In the grammar method, recognition is 
carried out through the predefinition of combinations of words to be recognized. For 
example, a sentence combination consists of <a><b><c>, where the candidates for <a> are 
"today", "tomorrow" and "the day after tomorrow"; the candidates for <b> are "the weather 
is" or "the weather will be"; and the candidates for <c> are "good" or "bad". In this case, 
sentences such as "today the weather is good", "today the weather is bad", and "tomorrow 
the weather will be good" are recognized. Word combinations (called phrases) such as the 
following are used in the present embodiment to recognize destinations. 
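
The <a><b><c> example above can be illustrated with a short Python sketch. It is a simplified stand-in that matches text rather than voice data, and the names SLOTS, ACCEPTED, and is_recognized are hypothetical.

```python
from itertools import product

# Candidate words for each slot of the sentence pattern <a><b><c>.
SLOTS = [
    ["today", "tomorrow", "the day after tomorrow"],   # <a>
    ["the weather is", "the weather will be"],         # <b>
    ["good", "bad"],                                   # <c>
]

# Every accepted sentence is one candidate per slot, in slot order.
ACCEPTED = {" ".join(words) for words in product(*SLOTS)}


def is_recognized(sentence: str) -> bool:
    """Return True if the sentence is one of the predefined combinations."""
    return sentence in ACCEPTED
```

Because only predefined combinations are accepted, "today the weather is good" is recognized, while a reordered sentence such as "the weather is good today" is not.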

[0023] Basic phrase 1 = <end><place>? 
Basic phrase 2 = <end><name>?<NULL>? 
Basic phrase 3 = <end><name>?<NULL> in <place>? 
Basic phrase 4 = <end><name>?<NULL> in <name> in <place>? 
Basic phrase 5 = <end><name>?<NULL> in <name>? 
Basic phrase 6 = <end><direction><place>? 
Note that <place> is a phrase that represents an address or area, where the address may be 
"Shizuoka" or "Shizuoka Prefecture" or the like, and the area may be "Izu" or "Bousou" or 
the like. <end> is a phrase that indicates the end of a sentence, such as "I want to go to", 
"take me to", "I want to stop at", "return to", "go to", "please", "to", "please go over to", 
and "over to". <NULL> is a phrase that represents a range or degree, such as "near", 
"around", "first", "closest to", "close to", "cheap", "good food", "delicious", "the usual", 
"close by", and "around here". <NULL> data is also data required when setting a 
landmark. <name> is a phrase that represents a name or a facility. Examples of <name> 
include "XX station", "XX parking lot", "XX golf course", "XX park", "XX interchange", 
"XX hospital", "XX harbor", "XX river", "XX tourist resort", and "XX hot springs". 
<direction> includes "toward XX" or the like. A question mark after a bracket indicates 
that the phrase in the brackets is not essential and may be ignored. Accordingly, the basic 
phrase 1 includes "Shizuoka" as well as "I want to go to Shizuoka". The examples 
mentioned earlier, "XX store in Mishima City" and "the parking lot near Mishima Station" 
correspond to the basic phrases 3 and 5, respectively. 

[0024] By using such a grammar method to analyze user spoken data, it is possible to 
analyze at least a portion of the spoken data, although the rest of the spoken data (especially 
the latter half of spoken data) may not permit analysis. More specifically, when analyzing 
the above spoken data "XX store in Mishima City", the word "Mishima City" is in the 
national-level recognition grammar dictionary and can be analyzed. However, the name 
"XX store" cannot be analyzed unless the municipal-level recognition grammar dictionary 
is used; moreover, the spoken data cannot be analyzed if a city other than Mishima City is 
specified in the municipal-level recognition grammar dictionary (for example, if the present location 
of the vehicle is Susono City, the default value of the municipal-level recognition grammar 
dictionary is Susono City). Thus, the grammar dictionary of the voice database 18 is 
changed using the result obtained from analysis (S104). Since "Mishima City" was 
obtained in the above example, the municipal-level recognition grammar dictionary is 
changed to data for Mishima City. 

[0025] After changing the voice database 18, the spoken data stored in the spoken data 
storage unit 22 in processing at S102 is read out and analyzed again (S105). At this time, 
the municipal-level recognition grammar dictionary is set to data for Mishima City, so 
the words "XX store" in the spoken data can be analyzed. Once the spoken data is 
completely analyzed, the control unit 14 searches for the destination from map data using 
the analysis result (S106). In this example, map data for Mishima City is read out to 
search for XX store. 
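
The flow from S102 to S105 for this example might be sketched as follows. This is a toy illustration under stated assumptions, not the embodiment's implementation: plain text stands in for the stored voice data, substring matching stands in for comparison with voice data, and the dictionaries NATIONAL and MUNICIPAL (including the entry "AA store") and the functions analyze and recognize are hypothetical.

```python
# Hypothetical toy dictionaries: the words each layer (or city) can recognize.
NATIONAL = {"Mishima City", "Mishima Station"}
MUNICIPAL = {
    "Susono City": {"AA store"},
    "Mishima City": {"XX store"},
}
CITIES = set(MUNICIPAL)


def analyze(utterance: str, vocabulary: set) -> list:
    """Return the vocabulary entries found in the utterance, sorted alphabetically."""
    return sorted(word for word in vocabulary if word in utterance)


def recognize(utterance: str, present_city: str) -> list:
    stored = utterance                                           # S102: store spoken data
    active_city = present_city                                   # default municipal dictionary
    first = analyze(stored, NATIONAL | MUNICIPAL[active_city])   # S103: first analysis
    for word in first:                                           # S104: change dictionary
        if word in CITIES:
            active_city = word
    return analyze(stored, NATIONAL | MUNICIPAL[active_city])    # S105: re-analysis


result = recognize("XX store in Mishima City", present_city="Susono City")
```

In this sketch the first analysis finds only "Mishima City", and the re-analysis of the same stored utterance also finds "XX store", without the user speaking a second time.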

[0026] Meanwhile, spoken data consisting of "I want to go to the parking lot near 
Mishima Station" can also be analyzed in the same manner in S103. In this case, the 
active recognition grammar dictionary (national-level in this case) can pick up and analyze 
the words "Mishima Station", "near", and "parking lot". Thereafter, the municipal-level 
recognition grammar dictionary is changed to data for Mishima City (S104). Next, the 
spoken data stored in the spoken data storage unit 22 is read out and analyzed again (S105). 
It should be noted that in the case of this example, the second analysis result is the same as 
the first analysis result because it was possible to analyze all the spoken data in the first 
analysis. Naturally, the words "XX parking lot" in the case of spoken data consisting of "I 
want to go to the XX parking lot near Mishima Station" cannot be analyzed in the first 
analysis. The "XX parking lot" portion can be analyzed in the second analysis after 
changing the voice database. Since there is also the <NULL> data "near", the control unit 
14 processes the analysis result of the <name> data before the <NULL> data as a landmark, 
and searches for parking lots starting from those closest to the coordinates (X, Y) of the 
landmark (Mishima Station) in the map data (S106). 
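
The landmark-based search in S106 could look roughly like the sketch below. The map entries, their coordinates, and the function search_near_landmark are all hypothetical, and straight-line distance is an assumption, since the embodiment does not specify how closeness is measured.

```python
import math

# Hypothetical map entries: (name, category, x, y). Coordinates are made up.
MAP_DATA = [
    ("Parking lot P1", "parking lot", 5.0, 5.0),
    ("Parking lot P2", "parking lot", 1.0, 0.0),
    ("Parking lot P3", "parking lot", 0.0, 3.0),
    ("XX store", "store", 0.5, 0.5),
]


def search_near_landmark(category, landmark_xy, map_data):
    """Return names of entries in the category, ordered from nearest to the landmark."""
    lx, ly = landmark_xy
    matches = [entry for entry in map_data if entry[1] == category]
    # Straight-line distance is an assumption of this sketch.
    matches.sort(key=lambda entry: math.hypot(entry[2] - lx, entry[3] - ly))
    return [entry[0] for entry in matches]


# The landmark coordinates (X, Y) are assumed to be (0, 0) in this example.
ordered = search_near_landmark("parking lot", (0.0, 0.0), MAP_DATA)
```

Candidates are returned nearest first, so the first entry is the natural default destination and the rest can be offered as alternatives.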

[0027] As described above, in the present embodiment, user spoken data is stored such 
that the voice database is automatically changed for re-analysis even when an analysis 
could not be performed the first time. Thus, the precision of voice recognition is 
improved, in addition to allowing the user to set the destination after speaking only once. 
[0028] In the present embodiment, if there is <NULL> data, data before <NULL> data is 
considered a landmark, based upon which a search is performed in map data. Hence, the 
actual destination can be found by searching map data in the vicinity of the landmark, 
thereby allowing the user to easily set a desired destination using natural vocalization. 
[0029] Furthermore, in the present embodiment, it is preferable to request more information from the 
user in order to improve the recognition rate in the case of homonyms. For 
example, if the user says "Toyota", a question such as "Toyota City or Toyota Town?" may 
be output from a speaker. 

[0030] Moreover, when analyzing spoken data, art such as assigning an annotation 
representing the type of each obtained item of data, to facilitate map database searches, may naturally be 
used in the present embodiment. For example, the numeral 11 can be assigned as 
the annotation for a prefecture, 13 for a city name, 42 for an area, and 32 for a name in a place. 
In this case, while a numeral (e.g., 91) is assigned as the annotation for <NULL> data such as 
"close to" and "near", no annotation should be assigned for <NULL> data such as "cheap" 
and "good food". This is because such words are not required for setting a destination 
(searching map data). 
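
As a rough illustration of this annotation scheme, the following Python sketch tags analyzed words with the numerals given above. The numerals come from the example in the text; the table layout, the kind labels, and the function annotate are hypothetical.

```python
# Annotation numerals from the example in the text; the table layout is hypothetical.
ANNOTATION = {"prefecture": 11, "city": 13, "area": 42, "name": 32, "proximity": 91}

# <NULL> words needed for a map search receive the proximity numeral.
PROXIMITY_WORDS = {"close to", "near"}


def annotate(tokens):
    """Attach an annotation numeral (or None) to each (word, kind) pair."""
    annotated = []
    for word, kind in tokens:
        if kind == "NULL":
            # <NULL> words such as "cheap" are not needed for the search: no annotation.
            code = ANNOTATION["proximity"] if word in PROXIMITY_WORDS else None
        else:
            code = ANNOTATION[kind]
        annotated.append((word, code))
    return annotated


tagged = annotate([("Mishima Station", "name"), ("near", "NULL"), ("cheap", "NULL")])
```

A map search can then skip any word whose annotation is None.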
[0031] 

[Effect of the Invention] As described above, according to the present invention, it is 
possible to set desired data, such as a destination for navigation, more easily by voice and 
lighten the burden of speaking on a user. 
[Brief Description of the Drawings] 

[FIG. 1] FIG. 1 is a structural block diagram of an embodiment. 

[FIG. 2] FIG. 2 is an explanatory drawing of a voice database structure of the embodiment. 
[FIG. 3] FIG. 3 is a processing flowchart of the embodiment. 
[Description of Reference Numerals] 
10 MICROPHONE 
12 PRESENT LOCATION DETECTING UNIT 

14 CONTROL UNIT 

16 OPERATING UNIT 

18 VOICE DATABASE 

20 MAP DATA STORAGE UNIT 

22 SPOKEN DATA STORAGE UNIT 

24 DISPLAY UNIT 

[FIG. 1] 
12/PRESENT LOCATION DETECTING UNIT 
14/CONTROL UNIT 
16/OPERATING UNIT 
18/VOICE DATABASE 
20/MAP DATA STORAGE UNIT 
22/SPOKEN DATA STORAGE UNIT 
24/DISPLAY UNIT 
[FIG. 2] 

NATIONAL-LEVEL RECOGNITION GRAMMAR DICTIONARY 
PREFECTURAL-LEVEL RECOGNITION GRAMMAR DICTIONARY 
MUNICIPAL-LEVEL RECOGNITION GRAMMAR DICTIONARY 
[FIG. 3] 
START 

S101/USER SPEAKS 
S102/RECORD USER VOICE 

S103/USE GRAMMAR FOR VOICE RECOGNITION / GRAMMAR ANALYSIS 
S104/CHANGE VOICE RECOGNITION GRAMMAR DICTIONARY USING ANALYSIS RESULT 

S105/USE RECORDED VOICE FOR RE-RECOGNITION / GRAMMAR RE-ANALYSIS 
S106/SEARCH FOR DESTINATION. IF LANDMARK USED, SEARCH IN ORDER 
STARTING FROM x, y COORDINATES OF LANDMARK 
END 